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Relevant events in a three state illness-death model (IDM) of a 
chronic disease are the diagnosis of the disease and death with or with- 
out the disease. In this article a simulation framework for populations 
i— i moving in the IDM is presented. The simulation is closely related 

P-] to the concept of Lexis diagrams in event history analysis. Details 

of the implementation and an example of a hypothetical disease are 
Q described. 
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1 A simple illness-death model 

oo 



A popular framework for studying irreversible diseases is the illness-death 
model (IDM) consisting of three states as depicted in Figur e [Tj N ormal, 



Disease and Death (Keiding 1991; Kalbfleisch and Prentice 2002; Aalen 
Brogan, and Gjessing 2008). The associated transition ratea^J are denoted 



with the symbols as in Figure [T] incidence i, and mortality rates too and 
mi. In general, the rates depend on different time scales: calendar time t, 
^ age a and in case of toi on the duration d of the disease. 

In this article a framework for simulating populations moving in the IDM is 
presented. For each person j, j = 1, . . . , n in the population, the relevant 
events diagnosis and death are simulated. This is accomplished in two steps: 



1. Contracting the disease and dying without the disease is modelled as 

(i) 

competing risk. Given the time t of birth of person j, the cumulative 



Synonymously: densities (in units "per person-year", not to be confused with risks or 
probabilities ( jVandenbroucke and Pearce||2012 1). 
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Figure 1: Three states model of normal, diseased and dead subjects. Tran- 
sition densities may depend on calender time t, age a, and in case of mi also 
on the duration d of the disease. 



(?) (i) 
distribution function iv of the "first failure time" T x is 



(t) = 1 - exp 



4 j) +r, r) + m (C+T, r)dr) . (1) 
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(7) 

The term "first failure time" T x refers to the time of diagnosis or 

(7) 

death without disease. T| is measured in time units after birth of 

(i) 

person j. Thus, T x is the age when the first transition from the 

state Normal occurs. Assumed we know that for person j at a 
transition occurs, then the odds of transiting into state Disease versus 
transiting into state Death is 



(i) rAj) 



1 



m 



(2) 



2. If the event at is the death (without the disease), the simulation 
for person j is finished. If, however, the event is the diagnosis of the 
disease, the "second failure time" T% to death (with disease) has the 



distribution function i 7 ! 
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mi lt%' + T{ J > + r, Tf J + t, t dr 



(3) 



The next section describes in detail how the integrals in Eqs. ([!]) and Q 
are calculated in the implementation of the simulation. After calculating 
the integrals, the question arises how the times T\ and T2 can be obtained 
from F\ and F^. This is done by the inverse transform sampling method: 
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Let F be a cumulative distribution function and u G (0, 1). For := 
inf {x | > it}, it holds: If f7 is a uniform random variable on (0,1), 

then i ?_1 (C/) follows the distribution F. Thus, the simulation of T% and T2 
is easy, if a random number generator for U such as runif in R is available. 

For each of the n persons in the population we store four pieces of data: 

1. a unique identifier j, 

2. the date tffl of birth (dob) of person j, 

3. the age at diagnosis (adi) of person j, and 

4. the age of death (ade) of person j. 

If the person j does not contract the disease, the age at diagnosis adi is set 
to NA (missing). 

In summary, we get the following 



Algorithm 1 Simulation of populations moving in the IDM 
1: for j = 1 to n do 

2: dob <- 4 J) 

3: calculate event time according to Eq. ([!]) 

4: simulate type of event that has happened at T± by Eq. ^ 

5: if event is diagnosis then 

6: adi <- T[ j) 

7: calculate time of death using Eq. (§ 

8: ade <- 1^ + T 2 (i) 

9: else 
10: adi «- NA 
11: ade i- T[ 3) 
12: end if 

13: write j , dob , adi , ade to file 
14: end for 



This allows us to the define the file format for storing the results of Algorithm 
[T| For each person j the four entries j , dob , adi , ade stored in a row of 
an ASCII text file. Delimiter between the four entries are semicolons (;), 
decimal separators are dots (.). The file extension is spd (simulation of 
populations moving in the illnees- death model). 

A few words are devoted to the applications and benefits, such a simulation 
has. The motivation for the algorithm comes from analytical epidemiology 
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where relations between common epidemiological measures are studied. Ex- 
amples for those measures are the prevalence, the duration of a disease, the 
age of onset or diagnosis, and lost life years (due to the disease). Obviously, 
the characteristics of the age of diagnosis can be obtained directly from the 
simulation. A typical question in that respect may be: what is the mean age 
of diagnosis of those subjects born between t s and i e ? Another interesting 
aim is the estimation of the incidence rate i from cross-sectional informa- 
tion. At a specific point in time t' , each of the subjects j = 1, . . . , n, has a 
unique "status". Neglecting those who are unborn or dead at t' , the status 
is either normal (non-diseased) or diseased. Thus, the status can be seen as 
a binary random variable, and data of this kind is typically called current 
status data. The current status is closely linked with the incidence i and the 
mortalities mo and mi. Estimating the incidence from current status data, 



for example, has been a topic in research for decades (Hens, Aerts, Faes, 



Shkedy, Lejeune, van Damme, and Beutels|2010 ) . The framework presented 



here may be useful in this field. 



2 Line integrals in the Lexis diagram 

In this section the calculation of the integrals in Eqs. ([I]) and ^ are de- 
scribed. Since we are interested in integrating arbitrary integrands i,mo, 
and mi, we use numerical integration. We assume that the integrands are 
given by numerical values on a regular grid. In event history analysis ( |Kei-| 



ding 2006), a useful concept is the Lexis diagram, which is a co-ordinate 
system with axes calendar time t (abscissa) and age a (ordinate). The cal- 
endar time dimension sometimes is referred to as period. Each subject is 
represented by a line segment from time and age at entry to time and age 
at exit. Entry and exit may be birth and death, respectively, or entry and 
exit in a epidemiological study or clinical trial. There are excellent and ex- 
tensive introductions about the theory of Lexis diagrams (see for example 



( KeidingllQQO , 1991 Carstensen|2006 ) and references therein), which allows 



to be short here. When it comes to irreversible diseases, the commonly used 
two-dimensional Lexis diagram with axes in time and age direction may be 
generalized to a three-dimensional co-ordinate system with disease duration 
d represented by the applicate (z-axis). If a subject does not get the disease 
during life time, the life line remains in the time-age-plane parallel to the 
line bisecting abscissa and ordinate. With other words, the life line for the 
time without disease is parallel to e\ := (1,1,0)* (where the triple (t,a,d) 
denotes the co-ordinates in time, age and duration direction, respectively). 
However if at a certain point in time E the disease is diagnosed, the life line 
changes its direction, henceforth parallel to e2 := (1, 1, 1)'. The situation 
is illustrated in Figure [2] The life lines of two subjects are shown in the 
three-dimensional Lexis space. At time of birth (denoted B u , ^ = 1,2) both 
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subjects are disease-free; both life lines are parallel to e\. The first subject 
gets the disease at E, and henceforth the life line is parallel to e 2 until death 
at D\. The second subject remains without the disease for the whole life, 
which ends at D 2 . 

d. 




Figure 2: Three-dimensional Lexis diagram with two life lines. Abscissa, 
ordinate and applicate represent calendar time t, age a and duration d, 
respectively. The life lines start at birth B v and end at death D v , v = 1,2. 
The first subject gets the disease at E. Then, the life line changes its 
direction. The second subject does not get the disease, the corresponding 
life line remains in the i-a-plane. 



Having the concept of the Lexis diagram at hand, we observe that F\ and 
F 2 in Eqs. ^ and ^ are line integrals in the Lexis space. We start 
with calculating the first failure times T±. For subject j the associated life 
line starts at (t,a) = (ig , 0). We chose an age u > 0, when it is sure 
that a transition to one of the states Disease or Death has occurred, say 



150 (years). For calculating we trace the hypothetical life line 



from Bj 



(t U) 



0) to Dj 



(4° 



oo, oS). Thus, the hypothetical life line 



has a representation 



Cj : Bj + a- {Dj - Bj), a G [0, 1]. 



As described in ( Brinks||2012 ) following the life line is related to raytracing 
in the field of computer graphics, where efficient algorithms for this purpose 



exist. In Siddon's algorithm (Siddon 1985), the key idea is to follow Cj 
by calculating intersections with volume elements (voxels), which form a 
regular partition of the Lexis space. Let 



4* 



{a {j) (p) \p = l,...,P®} 
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with = a^'(l) < • • • < (jw^pW) = 1 be a parametrization of the points 
where Cj intersects the voxel faces plus the start and and end points Bj and 
Dj. Details for the calculation of A*- are described in (Brinks 2012). The 
parametrization A* is ideally suited for approximating the integral in Eq. 
([I]) by the trapezoidal rule (Dahlquist and Bjorck 1974). The reason lies in 



the fact that in calculating pj (w) the values (t^ + a^\p)uo\ , p 



UTTJj) 



, are a byproduct. Algorithm [2] shows the necessary steps 



Algorithm 2 Calculating F\ 

1: for j = 1 to n do 

2: Calculate A* = {a^(p) \ p = 1, . . . , P^>}. 

3: h^O 

4: n «- 

5: /i^i(4 i} , 0)+m (4 j) , 0) 

6: F^'VO^O 

7: for p = 2 to pO') do 

8: T p ^Ol^\p)-UJ 

9: / p <r- i (t^ + T p , Tp) + m (tjj"^ + T p , T p 

10: £ p <- £p_l + 5 • (Tp - Tp_l) • (/p + /p_l) 

11: ^ (4 J) +t p )<-1- exp(-£ p ) 

12: end for 

13: end for 



Since the values of i and itlq are given on the voxel grid only, the calcula- 
tion of f p , p = 1, . . . , needs bilinear interpolation of the values of the 



adjacent voxels (Press, Flannery, Teukolsky, and Vetterling 1988). 



(i) (i) 
After preparing F^ ,j = 1, . . . , n, the times T x can be calculated by the 

(i) 

inverse transform sampling method. Since we have P x calculated at points 
Cp := 4^ + r p> P = 1> • • • 1 the inverse transform sampling would yield 
only those ( p . A better accuracy can be obtained by interpolating P x 
affine-linearly between consecutive Cp- For t G (Cp-i; Cp); P = 2,...,?^, 
let f := *~ C ;~ 1 . Then, it holds 

(Cp-i) + ^ • P^ (Cp)- 

For those subjects j' who contract the disease, the associated 
can be derived in a similar way as in Algorithm [2j The associated line 
segment starts at (t,a,d) = (4 + , Pp ,0). Again, a hypothetical 
maximal disease duration uj' is assumed, say oJ = 50 (years), such that the 
line segment ends at (t, a, d) = (t^ + t[ 3 ' ] + uj' , T ( / ] + uj' , uj'). Thus, the 
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line segment is parallel to e2 = (1,1,1)*. The Siddon algorithm computes the 
corresponding set of intersections with the voxel grid accordingly. The ages 
T2 ^ of death with disease are obtained from Algorithm [2] mutatis mutandis. 
The interpolation of mi needs to be trilinear. 



3 Example 

This section presents the results of a simulation. In each of sixty consecutive 
years t = 0, . . . , 59, two hundred persons are born and followed from birth 
to death. The incidence of a hypothetical chronic disease is assumed to be 
i(t, a) = , the mortality of the non-diseased is mo(t, a) = exp(— 10.7+ 

O.la + t ln(0.998)) and the mortality of the diseased is m\{t, a, d) = mo(t, a) ■ 
(0.04(d — 5) 2 + 1). In total, 4184 of the 12000 simulated persons contract the 
disease. The simulated data easily allows derivation of important epidemi- 
ological measures. For example, the histograms of the age at onset and age 
at death are shown in Figure [3} 




Figure 3: Histograms of the age at onset (left) and age at death (right) in 
the simulation. 

The median age at death of those who contracted the disease is 77.3 (years) 
whereas the median age at death of those without the disease is 79.6 (years). 
The mean duration of the disease in the 4184 ill subjects is 12.5 (years). 

To cross-check the results of the simulation, we compare them to a theoret- 
ical calculation. In year t = 100, exactly 7368 persons are alive, 799 having 
the hypothetical disease. Figure [4] shows the age-specific prevalence in the 
year 100. The black lines indicate the prevalence of several age groups to- 
gether with 95% confidence bounds as given by the simulation. The blue 
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line represents the prevalence calculated analytically by the exact formula 
in ( Keiding"||1991 Section 7.2). The results agree quite well within the con- 
fidence bounds. 




Age (years) 



Figure 4: Simulated (black) and analytically calculated (blue) prevalence 
of a hypothetical disease. The simulated prevalence is depicted with 95% 
confidence intervals. 



4 Summary 

This article is about simulating populations in an illness-death model con- 
sisting of the three states Normal, Disease, and Death. The disease is as- 
sumed to be irreversible. After birth of an subject in the population, two 
cases may occur: 

1. the subject dies without the disease, or 

2. the subject contracts the disease and dies with the disease. 

In the first case, the life lines of the Lexis diagram are solely located in the t- 
a-plane parallel to e\ = (1, 1, 0)'. In the second case, a part of the life lines is 
parallel to the e2 = (1,1,1)* direction. Changing the direction of the life line 
allows modelling the covariable duration of the disease. In many diseases, 



S 



the duration plays an important role for the mortality. Examples are dis- 



eases related to arteriosclerosis such as diabetes ( 


Carstenson, Kristensen, Ot- 


tosen, and Borch-Johnsen|2008 


) or lupus erythematosus ( 


Bernatsky, Boivin, 


Joseph, Manzi, Ginzler, Gladman, Urowitz, Fortinand, Petri, Barr, Gordon, 


Bae, Isenberg, Zoma, Aranow, 


Dooley, Nived, Sturfelt, Steinsson 


Alarcon, 


Senecal, Zummer, Hanly, Ensworth, Pope, Edworthy, Rahman, Sibley, El- 


Gabalawy, McCarthy, Pierre, Clarke, and Ramsey-Goldman|[2006 


)■ 



The simulation is based on raytracing techniques and provides a fast way to 
follow the individual life lines of subjects in the Lexis diagram. Computa- 
tion time is an issue, because the number of subjects may be large (several 
thousands) . The simulation calculates event times (diagnosis of the disease 
or death) and uses inverse transform sampling via cumulative distribution 
functions. The integrals occurring in the distribution functions are approxi- 
mated by the trapezoidal rule of numerical integration, which ideally fits to 
the raytracing technique. 
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